1.
Int. j. morphol ; 40(1): 148-156, feb. 2022. ilus, tab
Article in English | LILACS | ID: biblio-1385580

ABSTRACT

SUMMARY: Missing data may occur in any scientific study. Statistical shape analysis involves methods that use geometric information obtained from objects, and the most important input for using this information is landmarks. Missing data in shape analysis occur when information about the Cartesian coordinates of landmarks is lost. The aim of this study is to propose the F approach algorithm for estimating missing landmark coordinates and to compare its performance with generally accepted missing data estimation methods: the EM algorithm; PCA-based methods such as Bayesian PCA, Nonlinear Iterative Partial Least Squares (NIPALS) PCA, inverse nonlinear PCA and probabilistic PCA; and regression imputation. In the simulation study, the number of landmarks was set to 3, 6 and 9 and the sample size to 5, 10, 30, 50 and 100. The data were generated from a multivariate normal distribution with positive-definite variance-covariance matrices from isotropic models. Three different simulation scenarios and one simulation based on real data were considered, each with 1000 repetitions. Across all sample sizes, the best and most distinctive performance was obtained with the Min(F) criterion of the proposed F approach algorithm. In the three-landmark case, where only the proposed F approach and regression imputation can be applied, the Min(F) criterion gave the best results.
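Regression imputation, one of the comparison methods named above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' F approach (which the abstract does not specify): a single missing landmark coordinate is predicted from one other coordinate by ordinary least squares fitted on the specimens where both are observed. The coordinate values are hypothetical.

```python
from statistics import mean


def fit_simple_ols(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def regression_impute(values, predictor):
    """Replace None entries of `values` with OLS predictions from `predictor`.

    The model is fitted only on specimens where `values` is observed
    (assumption: `predictor` itself is never missing).
    """
    pairs = [(p, v) for p, v in zip(predictor, values) if v is not None]
    a, b = fit_simple_ols([p for p, _ in pairs], [v for _, v in pairs])
    return [a + b * p if v is None else v for p, v in zip(predictor, values)]


# Hypothetical x-coordinates: landmark 1 (complete) and landmark 3 (one missing)
landmark1_x = [1.0, 2.0, 3.0, 4.0]
landmark3_x = [2.0, 4.0, None, 8.0]
print(regression_impute(landmark3_x, landmark1_x))
```

With more landmarks, the single predictor would typically be replaced by a multiple regression on all observed coordinates.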


Subject(s)
Algorithms , Anatomic Landmarks , Data Interpretation, Statistical , Principal Component Analysis
2.
Rev. bras. estud. popul ; 38: e0139, 2021. tab, graf
Article in Portuguese | LILACS | ID: biblio-1280030

ABSTRACT

In this article, we estimate adult mortality by education level in São Paulo. We compare estimates based on deaths from the 2010 Census and from the 2013 Mortality Information System (Sistema de Informação de Mortalidade - SIM) - DATASUS, using three different measures of education level: as recorded in the SIM, as reported in the census for household heads, and as imputed statistically in the census for individuals who died. For the statistical imputation, we use the method of Dempster et al. (1977), which uses the expectation-maximization (EM) algorithm to deal with missing data. We consider three education levels (low, medium, and high) and estimate mortality rates with Poisson models. The results indicate that, between ages 25 and 59, more years of schooling are associated with mortality rates up to 77% lower. Secondary (medium) education provides most of the mortality gains at adult ages (about 50%). The mortality differentials calculated with SIM death records and with census deaths with statistically imputed education are similar. However, estimates based on the assumption that the deceased's education equals that of the household head in the census produce atypical mortality patterns. We hope that the imputation model proposed here can be used in future analyses of mortality by socioeconomic status (SES) using census deaths.
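The link between Poisson models and the percentage reductions reported above can be illustrated with a small calculation. Assuming a Poisson model whose only covariate is education group, with log person-years as an offset, the maximum likelihood rate ratio equals the ratio of crude rates, so a rate ratio of 0.23 corresponds to a rate 77% lower. The counts are hypothetical, and the study's own model specification (for example any age terms) is not reproduced.

```python
def crude_rate(deaths, person_years, per=1000):
    """Deaths per `per` person-years."""
    return deaths / person_years * per


def rate_ratio(deaths_a, py_a, deaths_ref, py_ref):
    """Rate ratio of group a versus the reference group.

    For a Poisson model log(rate) = b0 + b1 * group with a log(person-years)
    offset and no other covariates, exp(b1) at the maximum likelihood
    estimate equals this ratio of crude rates.
    """
    return (deaths_a / py_a) / (deaths_ref / py_ref)


# Hypothetical (deaths, person-years); NOT data from the study
low_education = (400, 100_000)   # reference group
high_education = (46, 50_000)

rr = rate_ratio(*high_education, *low_education)
print(f"rate ratio = {rr:.2f}; reduction = {(1 - rr) * 100:.0f}%")
# prints: rate ratio = 0.23; reduction = 77%
```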


Subject(s)
Humans , Mortality , Censuses , Educational Status , Survivorship , Reference Standards , Algorithms , Brazil , Information Systems , Education, Primary and Secondary
3.
Chinese Journal of Epidemiology ; (12): 1563-1568, 2017.
Article in Chinese | WPRIM | ID: wpr-737874

ABSTRACT

Objective: To compare the results of different methods for handling missing HIV viral load (VL) data under different missingness mechanisms. Methods: Using SPSS 17.0, we simulated complete and incomplete datasets with different missingness mechanisms from HIV VL data collected from men who have sex with men (MSM) in 16 Chinese cities in 2013. Maximum likelihood estimation with the expectation-maximization (EM) algorithm, regression imputation, mean imputation, deletion, and Markov chain Monte Carlo (MCMC) were each used to handle the missing data. The methods were compared in terms of distribution characteristics, accuracy and precision. Results: HIV VL data could not be transformed to a normal distribution. All methods performed well when data were missing completely at random (MCAR). For the other missingness mechanisms, the regression and MCMC methods preserved the main characteristics of the original data. The means of the datasets imputed by the different methods were all close to the original mean. The EM, regression, mean imputation and deletion methods underestimated VL, whereas MCMC overestimated it. Conclusion: MCMC can be used as the main imputation method for missing HIV VL data, and the imputed data can serve as a reference for estimating mean HIV VL in the surveyed population.
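The contrast between deletion, mean imputation and regression imputation described above can be reproduced on synthetic data. This minimal Python sketch (not the SPSS procedure the authors used) assumes normally distributed values and a MAR mechanism in which y is missing whenever the observed covariate x exceeds 0.5; real VL data are skewed, as the abstract notes, and the EM and MCMC methods are not shown.

```python
import random
from statistics import mean, variance


def simulate(n=5000, seed=1):
    """Synthetic data: y = 2x + noise, with y missing whenever x > 0.5 (MAR)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 1) for x in xs]
    observed = [x <= 0.5 for x in xs]   # missingness depends only on observed x
    return xs, ys, observed


def mean_impute(ys, observed):
    """Replace every missing y with the mean of the observed y values."""
    m = mean(y for y, o in zip(ys, observed) if o)
    return [y if o else m for y, o in zip(ys, observed)]


def regression_impute(xs, ys, observed):
    """Replace missing y with predictions from an OLS fit of y on x (observed cases)."""
    px = [x for x, o in zip(xs, observed) if o]
    py = [y for y, o in zip(ys, observed) if o]
    mx, my = mean(px), mean(py)
    b = (sum((x - mx) * (y - my) for x, y in zip(px, py))
         / sum((x - mx) ** 2 for x in px))
    a = my - b * mx
    return [y if o else a + b * x for x, y, o in zip(xs, ys, observed)]


xs, ys, observed = simulate()
complete_case = [y for y, o in zip(ys, observed) if o]   # deletion
for label, data in [("full data", ys), ("deletion", complete_case),
                    ("mean imputation", mean_impute(ys, observed)),
                    ("regression imputation", regression_impute(xs, ys, observed))]:
    print(f"{label:22s} mean {mean(data):6.2f}  variance {variance(data):5.2f}")
```

In this setup deletion and mean imputation shift the mean downward, and mean imputation also shrinks the variance; deterministic regression imputation recovers the mean but still understates variability, which is one motivation for stochastic methods such as MCMC.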

4.
Chinese Journal of Epidemiology ; (12): 1169-1173, 2017.
Article in Chinese | WPRIM | ID: wpr-737797

ABSTRACT

Objective: To analyze the effect of missing data in population-based viral load (PVL) surveys of HIV-infected men who have sex with men (MSM) sampled in 16 cities in China. Methods: We used data from three consecutive viral load sampling surveys conducted among HIV-infected MSM in 16 large cities (Beijing, Shanghai, Nanjing, Hangzhou, Wuhan, Chongqing, Kunming, Xi'an, Guangzhou, Shenzhen, Nanning, Urumqi, Harbin, Changchun, Chengdu and Tianjin) during 2013-2015. SPSS 17.0 was used to describe the distribution of missing data and to analyze associated factors. Results: A total of 12 150 HIV-infected MSM were randomly selected for the surveys, of whom 9 141 (75.2%) received viral load tests, while 3 009 (24.8%) did not and therefore had missing viral load data. The missing rates among MSM with and without access to antiretroviral therapy (ART) were 11.5% (765/6 675) and 39.4% (2 060/5 223), respectively, and among local and non-local residents (migrants) they were 21.9% (1 866/8 523) and 28.4% (959/3 374), respectively. Conclusions: Missing data occurred in the viral load surveys of HIV-infected MSM, with ART status and household registration status as the main influencing factors. Missing data could affect the accurate evaluation of community viral load (CVL) and population viral load (PVL) levels among HIV-infected MSM in China.
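The missing-data rates above follow directly from the reported counts. The sketch below recomputes them and adds a Pearson chi-square test for the ART comparison; the test is an illustrative addition, not a result reported in the abstract. The ART and residence subgroups do not sum to the overall total, so each rate uses its own denominator.

```python
import math


def missing_rate(missing, total):
    """Percentage of sampled MSM without a viral load result."""
    return 100 * missing / total


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p-value); for 1 degree of freedom the upper-tail
    probability is erfc(sqrt(stat / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))


groups = {  # (missing, total), counts from the abstract
    "overall": (3009, 12150),
    "on ART": (765, 6675),
    "not on ART": (2060, 5223),
    "local residents": (1866, 8523),
    "non-local residents": (959, 3374),
}
for name, (miss, total) in groups.items():
    print(f"{name}: {missing_rate(miss, total):.1f}%")

# Rows: on ART / not on ART; columns: missing / tested
stat, p = chi_square_2x2(765, 6675 - 765, 2060, 5223 - 2060)
print(f"chi-square = {stat:.1f}, p = {p:.3g}")
```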

5.
Chinese Journal of Epidemiology ; (12): 674-678, 2017.
Article in Chinese | WPRIM | ID: wpr-737705

ABSTRACT

To apply tipping-point analysis, a visual method, to missing data in clinical studies and to discuss related problems. All possible outcomes of the missing data were enumerated, the tipping points at which the P-value of the hypothesis test crossed the 0.05 significance level were identified, and the proportion of outcomes with P<0.05 was calculated to reflect the reliability of the study's result. Tipping-point analysis can be applied to both continuous and binary data to locate the points where P-values change. For the continuous data, the area with P<0.05 was 93.6%, indicating that the study's success is highly reliable; for the binary data, it was 29.7%, indicating that the study's success has low reliability. Tipping-point analysis, which provides visual evidence for decision-making, is suitable for analyzing clinical studies with missing data.
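The tipping-point procedure described above can be sketched for binary data: enumerate every possible number of successes among the missing patients in each arm, recompute the test, and report the share of scenarios with P<0.05. This minimal sketch assumes a pooled two-proportion z-test (the abstract does not name the test), does not check the direction of the effect, and uses hypothetical trial counts.

```python
import math


def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p-value of the pooled two-proportion z-test."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def tipping_point_grid(obs_t, n_t, miss_t, obs_c, n_c, miss_c, alpha=0.05):
    """Enumerate every possible number of successes among missing subjects.

    obs_t / obs_c: successes among observed subjects (treatment / control);
    n_t / n_c: observed subjects; miss_t / miss_c: missing subjects.
    Returns (share of scenarios with p < alpha, {(k_t, k_c): significant}).
    """
    grid = {}
    for k_t in range(miss_t + 1):
        for k_c in range(miss_c + 1):
            p = two_proportion_p(obs_t + k_t, n_t + miss_t,
                                 obs_c + k_c, n_c + miss_c)
            grid[(k_t, k_c)] = p < alpha
    share = sum(grid.values()) / len(grid)
    return share, grid


# Hypothetical trial: 45/80 vs 30/80 observed successes, 10 missing per arm
share, grid = tipping_point_grid(45, 80, 10, 30, 80, 10)
print(f"{share:.1%} of the {len(grid)} scenarios give P < 0.05")
```

Plotting `grid` with k_t and k_c on the axes gives the visual map in which the tipping points form the boundary between significant and non-significant scenarios.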

6.
World Science and Technology-Modernization of Traditional Chinese Medicine ; (12): 1966-1975, 2017.
Article in Chinese | WPRIM | ID: wpr-696130

ABSTRACT

This article introduces functional clustering methods and demonstrates their performance through a real analysis of Zong Qi data from traditional Chinese medicine. Functional clustering assumes that discrete time-series observations are governed by an underlying continuous function of time, which can be expressed as a combination of (potentially infinitely many) basis functions. Functional clustering methods include the raw data method, the filtering method and the adaptive method. For sparse data, the raw data method runs into difficulties with matrix calculations because data are missing at some time points. The filtering method suits fully observed data, but with missing data the fitted curves are inaccurate, so the clustering results cannot be interpreted. The adaptive method can be applied flexibly to both fully and sparsely sampled data. In the real data analysis, the adaptive method was used to cluster sparsely sampled Zong Qi time-series data, dividing the elderly individuals into three clusters with high, moderate and low levels of Zong Qi. The adaptive method performed well in clustering individuals.
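The filtering (two-stage) approach described above can be sketched as: fit each individual's series with a small set of basis functions, then cluster the fitted coefficients. In this minimal Python sketch the quadratic polynomial basis, plain k-means and the example series are all assumptions for illustration; it is not the authors' implementation, and the adaptive method is not shown. Each fit here needs at least three distinct time points, which hints at why per-individual curve fitting becomes unreliable for sparse series.

```python
def solve3(a, b):
    """Solve a 3x3 linear system a @ x = b by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def basis_coefficients(times, values):
    """Least-squares fit of values ~ c0 + c1*t + c2*t^2 (assumed quadratic basis)."""
    phi = [[1.0, t, t * t] for t in times]
    ata = [[sum(p[i] * p[j] for p in phi) for j in range(3)] for i in range(3)]
    atb = [sum(p[i] * v for p, v in zip(phi, values)) for i in range(3)]
    return solve3(ata, atb)


def kmeans(points, k, iters=50):
    """Plain k-means; initial centres are the first k points (a simplification)."""
    centres = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sum((a - b) ** 2
                                                  for a, b in zip(p, centres[j])))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical series: (observation times, values) per individual
series = [([0, 1, 2], [10, 10, 10]), ([0, 2, 4], [2, 2, 2]),
          ([0, 1, 3], [9.5, 9.5, 9.5]), ([1, 2, 3], [2.5, 2.5, 2.5])]
coefs = [basis_coefficients(t, v) for t, v in series]
print(kmeans(coefs, k=2))  # prints [0, 1, 0, 1]
```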

10.
Actual. psicol. (Impr.) ; 29(119), Dec. 2015.
Article in Spanish | LILACS-Express | LILACS | ID: biblio-1505549

ABSTRACT

Most social and educational data contain missing observations due to attrition or nonresponse. Missing data methodology has improved dramatically in recent years, and popular software now offers a variety of sophisticated options. Despite the widespread availability of theoretically justified methods, many researchers still rely on old imputation techniques that can bias their analyses. This article provides a conceptual introduction to patterns of missing data. It then introduces how to handle and analyze missing information using the modern approaches of full-information maximum likelihood (FIML) and multiple imputation (MI). An introduction to planned missing data designs is also included, and new computational tools such as the Quark function and the semTools package are mentioned. The authors hope that this paper encourages researchers to implement modern methods for analyzing missing data.
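FIML, mentioned above, does not fill in values; instead each case contributes the likelihood of whichever variables it actually has. The minimal sketch below assumes a bivariate normal model in which x is always observed and y may be missing (None). It only evaluates the observed-data log-likelihood; maximizing it (with a numerical optimizer or EM) is left out, and it is not the Quark or semTools code.

```python
import math


def normal_logpdf(x, mu, var):
    """Log density of a univariate normal N(mu, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def bivariate_logpdf(x, y, mx, my, vx, vy, cxy):
    """Log density of a bivariate normal with variances vx, vy and covariance cxy."""
    det = vx * vy - cxy ** 2
    dx, dy = x - mx, y - my
    quad = (vy * dx * dx - 2 * cxy * dx * dy + vx * dy * dy) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def fiml_loglik(data, mx, my, vx, vy, cxy):
    """Observed-data log-likelihood: complete cases use the joint density,
    cases with y missing use only the marginal density of x."""
    total = 0.0
    for x, y in data:
        if y is None:
            total += normal_logpdf(x, mx, vx)
        else:
            total += bivariate_logpdf(x, y, mx, my, vx, vy, cxy)
    return total


# Hypothetical data with one missing y
data = [(0.2, 0.5), (-1.0, -0.7), (1.3, None), (0.4, 0.1)]
print(fiml_loglik(data, mx=0.0, my=0.0, vx=1.0, vy=1.0, cxy=0.5))
```

Because the incomplete case still informs the x parameters (and, through the covariance, the estimates for y), no observation is discarded, which is the key difference from listwise deletion.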

11.
Journal of Central South University(Medical Sciences) ; (12): 1289-1294, 2013.
Article in Chinese | WPRIM | ID: wpr-440088

ABSTRACT

Missing data affect almost all surveys and research studies. Missing data cause a loss of original sample information and can undermine the validity of research results to some extent, so researchers should pay close attention to this problem. In this article, we introduce three missingness mechanisms: missing completely at random, missing at random, and not missing at random. We summarize common approaches to dealing with missing data, including deletion, weighting, imputation and likelihood-based parameter estimation. Since each method has its pros and cons, the appropriate way to handle missing data should be selected carefully according to the missingness mechanism.
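The three mechanisms listed above can be illustrated by deleting values from synthetic data in three ways. This is a minimal sketch under simplifying assumptions: a fixed 30% deletion rate and deterministic rules (drop the cases with the largest x for MAR, the largest y itself for MNAR), whereas missingness is usually modelled as probabilistic. Comparing the observed-data mean with the full-data mean shows that, in this example, the complete-case mean stays close to the truth only under MCAR.

```python
import random
from statistics import mean


def make_missing(xs, ys, mechanism, rate=0.3, seed=0):
    """Return ys with a fraction `rate` of entries replaced by None.

    MCAR: every y has the same chance of being deleted.
    MAR:  deletion depends only on the fully observed x (largest x values).
    MNAR: deletion depends on the unobserved y itself (largest y values).
    """
    rng = random.Random(seed)
    n = len(ys)
    k = round(rate * n)                      # number of values to delete
    if mechanism == "MCAR":
        drop = set(rng.sample(range(n), k))
    elif mechanism == "MAR":
        drop = set(sorted(range(n), key=lambda i: xs[i])[n - k:])
    elif mechanism == "MNAR":
        drop = set(sorted(range(n), key=lambda i: ys[i])[n - k:])
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return [None if i in drop else y for i, y in enumerate(ys)]


rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(2000)]
ys = [x + rng.gauss(0, 1) for x in xs]
for mech in ("MCAR", "MAR", "MNAR"):
    observed = [y for y in make_missing(xs, ys, mech) if y is not None]
    print(mech, round(mean(ys), 2), round(mean(observed), 2))
```

Under MAR the bias can in principle be corrected by using x (for example through imputation or likelihood methods), whereas under MNAR the observed data alone do not identify the correction.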

12.
Korean Journal of Anesthesiology ; : 402-406, 2013.
Article in English | WPRIM | ID: wpr-27437

ABSTRACT

Even in a well-designed and controlled study, missing data occurs in almost all research. Missing data can reduce the statistical power of a study and can produce biased estimates, leading to invalid conclusions. This manuscript reviews the problems and types of missing data, along with the techniques for handling missing data. The mechanisms by which missing data occurs are illustrated, and the methods for handling the missing data are discussed. The paper concludes with recommendations for the handling of missing data.


Subject(s)
Bias , Handling, Psychological
13.
An. acad. bras. ciênc ; 83(1): 61-72, Mar. 2011. ilus, graf, tab
Article in English | LILACS | ID: lil-578282

ABSTRACT

Missing data are a common problem in paleontology. They make it difficult to reconstruct extinct taxa accurately and restrict the inclusion of some taxa in comparative and biomechanical studies. In particular, estimating the position of vertebrae in incomplete series is often done non-empirically and does not allow precise estimation of the missing parts. In this work we present a method for calculating the position of preserved middle sequences of caudal vertebrae in the saurischian dinosaur Staurikosaurus pricei, based on the length and height of the preserved anterior and posterior caudal vertebral centra. Regression equations were used to estimate these dimensions for the middle vertebrae and, consequently, to determine the position of the preserved middle sequences. The method also allowed these dimensions to be estimated for non-preserved vertebrae. Results indicate that the preserved caudal vertebrae of Staurikosaurus may correspond to positions 1-3, 5, 7, 14-19/15-20, 24-25/25-26, and 29-47, and that at least 25 vertebrae had transverse processes. The total length of the tail was estimated at 134 cm and the total body length at 220-225 cm.
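The regression idea described above can be sketched as follows: fit centrum length against serial position using vertebrae whose positions are known, then invert the fitted line to place an isolated middle vertebra, or evaluate it to predict the size of a missing one. This is an illustration under assumptions: a single linear trend in length only (the study used both length and height) and hypothetical measurements rather than the Staurikosaurus data.

```python
from statistics import mean


def fit_line(xs, ys):
    """OLS fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def estimate_position(length_mm, a, b):
    """Invert length = a + b * position to get a (fractional) serial position."""
    return (length_mm - a) / b


# Hypothetical (position, centrum length in mm) for preserved anterior and
# posterior caudals whose positions are known; NOT the Staurikosaurus data.
known = [(1, 31.6), (2, 31.2), (3, 30.8), (40, 16.0), (45, 14.0), (47, 13.2)]
a, b = fit_line([p for p, _ in known], [length for _, length in known])

print(round(estimate_position(26.0, a, b)))  # middle vertebra 26.0 mm long; prints 15
print(round(a + b * 30, 1))                  # predicted length at position 30; prints 20.0
```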


Subject(s)
Animals , Dinosaurs/anatomy & histology , Paleontology/methods , Spine/anatomy & histology , Tail/anatomy & histology , Dinosaurs/classification , Fossils
14.
Rev. bras. epidemiol ; 13(4): 596-606, Dec. 2010. ilus, graf, tab
Article in Portuguese | LILACS | ID: lil-569101

ABSTRACT

INTRODUCTION: Studies in health frequently face problems with missing data. Through imputation, artificially complete data sets are built that can be analyzed with traditional statistical techniques. The objective of this paper is to compare three types of imputation using real data. METHODS: The data came from a study on the development of risk models for surgical mortality, with a sample of 450 patients. The imputation methods applied were two single imputations and one multiple imputation, under the assumption that data were missing at random (MAR). RESULTS: The variable with missing data was serum albumin, with a missing rate of 27.1%. The logistic models fitted after the two single imputations were similar to each other but differed from those obtained with multiple imputation in which variables were included. CONCLUSIONS: The results indicate that it is important to take into account the relationship of albumin to other observed variables, because single and multiple imputation produced different models. Single imputation underestimates variability, generating narrower confidence intervals. Imputation methods should be considered when data are missing, especially multiple imputation, which takes between-imputation variability into account in the model estimates.
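Why single imputation yields narrower confidence intervals can be made concrete with Rubin's rules for pooling multiple imputations. A single imputed dataset only provides a within-imputation variance (W), whereas multiple imputation adds the between-imputation variance (B). The minimal sketch below uses hypothetical coefficients and variances, not estimates from the surgical-risk study.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m >= 2 completed-data analyses with Rubin's rules.

    Returns (pooled estimate, within variance W, between variance B, total T).
    """
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b          # total variance of q_bar
    return q_bar, w, b, t


# Hypothetical albumin coefficients (log-odds) and squared SEs from m = 5 imputations
est = [-0.42, -0.35, -0.50, -0.38, -0.46]
var = [0.010, 0.011, 0.009, 0.010, 0.012]
q, w, b, t = pool_rubin(est, var)
print(f"estimate {q:.3f}; SE from W only {math.sqrt(w):.3f}; MI SE {math.sqrt(t):.3f}")
```

Since T is always at least W, intervals based on the pooled variance are wider than those from one imputed dataset, reflecting the uncertainty about the missing albumin values.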


Subject(s)
Humans , Epidemiologic Methods , Models, Statistical , Surgical Procedures, Operative/mortality , Risk
15.
Rev. bras. educ. fís. esp ; 24(3): 413-431, Jul.-Sep. 2010. graf, tab
Article in Portuguese | LILACS | ID: lil-604579

ABSTRACT

The main aim of this study is to present a tutorial for Sport Sciences and Physical Education researchers on the challenges that arise in longitudinal data analysis. Using a real data set from the Muzambinho mixed-longitudinal study, we address three main concerns: 1) building a developmental view based on hierarchical or multilevel modeling; 2) presenting two solutions to the missing data problem; and 3) investigating the stability of interindividual differences in intraindividual change (i.e., tracking). For each issue, questions are posed and their answers are presented alongside the main results from the different statistical software packages used.


Subject(s)
Data Interpretation, Statistical , /statistics & numerical data , Research
16.
Genomics & Informatics ; : 129-132, 2007.
Article in English | WPRIM | ID: wpr-86062

ABSTRACT

arrayImpute is software for exploratory analysis of missing data and imputation of missing values in microarray data. It also provides a comparative analysis of the values imputed by various methods, allowing users to choose an appropriate imputation method for their microarray data. It is built on R and provides a user-friendly graphical interface, so users can easily use arrayImpute to explore and estimate missing data and to compare imputation methods before further analysis.
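arrayImpute itself is an R package; to illustrate what such tools compute, the sketch below implements k-nearest-neighbour (KNN) imputation, one widely used method for microarray data. It is a minimal Python sketch, not arrayImpute's code or interface; the RMS distance over shared arrays, k=2 and the expression values are assumptions.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries of a genes x arrays matrix from the k most similar genes.

    Similarity is the root-mean-square difference over arrays observed in both
    genes; only genes that observe the target array are eligible neighbours.
    """
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for g, other in enumerate(matrix):
                if g == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            nearest = sorted(candidates)[:k]
            if nearest:  # leave the gap if no gene can serve as a neighbour
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Hypothetical expression values (rows = genes, columns = arrays)
expr = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.1],
    [0.9, 1.9, 2.9],
    [10.0, 20.0, 30.0],
]
print(knn_impute(expr, k=2)[0])
```

Comparing several such methods on the same matrix, for example by masking known values and measuring the error, is the kind of comparative analysis the abstract describes.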

17.
Yonsei Medical Journal ; : 829-837, 2004.
Article in English | WPRIM | ID: wpr-203771

ABSTRACT

Missing data, such as missing appropriateness ratings, are a common problem in clinical research and often yield biased results. This paper introduces the multiple imputation method for handling missing data in clinical research and suggests that multiple imputation can give more accurate estimates than complete-case analysis. The idea of multiple imputation is that each missing value is replaced with more than one plausible value. The appropriateness method was developed as a pragmatic solution to the problem of assessing "appropriate" surgical and medical procedures for patients. Cataract surgery was selected as one of four procedures evaluated as part of the Clinical Appropriateness Initiative. We created mild to high missing rates of 10%, 30% and 50% and compared the performance of logistic regression for cataract surgery. We treated the coefficients from the original data as the true parameters and compared the other results with them. At the mild missing rate (10%), the deviation from the true coefficients was small and negligible, and complete-case analysis, which removes the cases with missing data, did not reveal any serious bias. However, as the missing rate increased, the bias became non-negligible and distorted the results. This simulation study suggests that multiple imputation can give more accurate estimates than complete-case analysis, especially at moderate to high missing rates (30-50%). In addition, multiple imputation yields better accuracy than single imputation. Therefore, multiple imputation is useful and efficient in clinical research situations with large amounts of missing data.


Subject(s)
Humans , Cataract Extraction/methods , Logistic Models
18.
Japanese Journal of Pharmacoepidemiology ; : 71-82, 2001.
Article in Japanese | WPRIM | ID: wpr-376062

ABSTRACT

Quality of life (QOL) evaluated by patients themselves has become one of the important outcomes in clinical practice as well as clinical trials. Recently clinicians have attempted to gather QOL evaluation data in their clinical practice setting and integrate the findings into the medical decision-making process. To date, several multidimensional generic questionnaires consisting of multiple domains such as functional, physical, mental and social well-being, have been developed and utilized for generic QOL evaluation in clinical trials, especially in the oncology area. To develop a well-constructed and valid QOL questionnaire, its psychometric characteristics such as reliability, validity, responsiveness and feasibility must be adequately assessed in the research setting.<BR>In clinical trials, QOL data are generally measured in a longitudinal fashion and there are two prominent embarrassing statistical problems : one is the multiplicity due to replication (in time) of statistical tests and the other is the occurrence of missing data due to a variety of reasons. Non-random missing data which occurs because of any reasons related to a patient's present status and/or future prognosis possibly leads to bias and misinterpretation of the results of a trial. To solve the multiplicity problem, the repeated-measures ANOVA-type data analysis or summarization of a repeated measures into an appropriate summary measure can be applied. Missing data can be prevented to some extent by allocating/training coordinators at each participating institute and establishing a communication network between a data center and participating institutes. However, missing data will occur inevitably due to the deterioration of a patient's physical status in the area of life threatening diseases suchas advanced cancer or other diseases with poor prognosis. 
Although several statistical approaches to handling missing data, including non-random missingness, have been proposed, no single analytical solution fully resolves the non-random missing-data problem. The best remedy would be to collect information about why the data are missing, so that the missing-data mechanism can be identified and taken into account in the statistical analysis. A so-called “sensitivity analysis” comparing the results of several analytical methods, such as different imputation techniques or newly proposed approaches, would also be useful. The QALY (Quality-Adjusted Life Year) weights survival time by utilities evaluated by patients themselves and was devised to incorporate a patient's judgment into treatment selection. Ultimately, QOL assessment should be used for “individualized” or “tailor-made” treatment, and statistical methodology for gathering, analyzing and using QOL data should be developed further.
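The QALY weighting described above amounts to multiplying the time spent in each health state by the utility attached to that state and summing. A minimal sketch, with a hypothetical function name and made-up utilities; discounting, which many economic evaluations add, is deliberately omitted.

```python
from typing import Iterable, Tuple


def qalys(periods: Iterable[Tuple[float, float]]) -> float:
    """Sum of (years spent in a health state) x (utility of that state).

    Utilities are conventionally anchored at 0 (dead) and 1 (full health);
    no discounting is applied in this sketch.
    """
    total = 0.0
    for years, utility in periods:
        if years < 0:
            raise ValueError("time spent in a health state cannot be negative")
        total += years * utility
    return total


# Hypothetical patient: 2 years at utility 0.8, then 1 year at utility 0.5.
print(qalys([(2.0, 0.8), (1.0, 0.5)]))
```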

19.
Academic Journal of Second Military Medical University ; (12)2001.
Article in Chinese | WPRIM | ID: wpr-555432

ABSTRACT

Objective: To explore the results of different methods for handling multivariate missing data. Methods: Case deletion, simple imputation and multiple imputation were compared in the analysis of clinical data from 925 liver cancer patients with a moderate amount of multivariate missing data. Results: The 3 methods gave different results. At α = 0.05, the risk factors influencing patients' survival time were clinical staging, history of hepatic cirrhosis, portal vein tumor thrombus, and levels of γ-GT and WBC with multiple imputation, and TNM staging, lipiodol dose, AST and ALP with case deletion. Compared with multiple imputation, simple imputation identified 3 additional factors: TNM staging, ALP and AFP. Conclusion: Simple imputation is superior to case deletion for managing multivariate missing data but tends to underestimate standard errors and yield smaller P values. Multiple imputation is more reasonable and scientifically sound than the other 2 methods.
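The abstract does not state how the multiple-imputation results were combined; the usual approach is Rubin's rules, sketched below as an assumption. The sketch also shows why simple (single) imputation understates uncertainty: analysing one filled-in data set keeps only the within-imputation variance and drops the between-imputation term. The function name and the numbers are illustrative.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Combine m completed-data analyses with Rubin's rules.

    estimates[j] and variances[j] are a coefficient and its squared standard
    error from the analysis of the j-th imputed data set.
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 denominator)
    t = w + (1.0 + 1.0 / m) * b      # total variance
    return q_bar, sqrt(t)


# Hypothetical coefficients from three imputed data sets, each with SE^2 = 0.5.
est, se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(est, se, sqrt(0.5))  # a single imputation would report SE = sqrt(0.5)
```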

20.
Korean Journal of Preventive Medicine ; : 875-884, 1998.
Article in Korean | WPRIM | ID: wpr-199622

ABSTRACT

Missing observations are common in medical research and health survey research, and several statistical methods for handling missing data have been proposed. The EM (Expectation-Maximization) algorithm is one way of handling missing data efficiently, based on sufficient statistics. In this paper, we developed statistical models and methods for survey data with multivariate missing observations; in particular, we adopted the EM algorithm to handle them. We assume that the multivariate observations follow a multivariate normal distribution, where the mean vector and the covariance matrix are of primary interest. We applied the proposed method to data from a health survey, a physician survey on the Resource-Based Relative Value Scale (RBRVS). In addition to the EM algorithm, we applied complete-case analysis, which uses only completely observed cases, and available-case analysis, which uses all available information. Residual and normal probability plots were examined to assess the normality assumption. The residual sum of squares from the EM algorithm was smaller than those from the complete-case and available-case analyses.
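A minimal sketch of the EM idea for the simplest multivariate normal case: two variables, X fully observed and Y partly missing. The E-step replaces the missing parts of the sufficient statistics (sums of Y, Y² and XY) with their conditional expectations given X; the M-step recomputes the mean vector and covariance matrix from those completed statistics. The bivariate setting with a single incomplete variable, the function name and the data are assumptions for illustration; the paper itself addresses general multivariate missing patterns.

```python
from typing import List, Optional, Tuple

Params = Tuple[float, float, float, float, float]  # (mx, my, sxx, sxy, syy)


def em_bivariate_normal(
    x: List[float], y: List[Optional[float]], max_iter: int = 500, tol: float = 1e-12
) -> Params:
    """ML estimates of the means and (co)variances of (X, Y), bivariate normal,
    where X is fully observed and missing Y values are given as None."""
    n = len(x)
    cc = [(a, b) for a, b in zip(x, y) if b is not None]
    if len(cc) < 2:
        raise ValueError("need at least two complete cases to initialise")
    # Starting values: complete-case moments.
    k = len(cc)
    mx = sum(a for a, _ in cc) / k
    my = sum(b for _, b in cc) / k
    sxx = sum((a - mx) ** 2 for a, _ in cc) / k
    sxy = sum((a - mx) * (b - my) for a, b in cc) / k
    syy = sum((b - my) ** 2 for _, b in cc) / k
    if sxx <= 0:
        raise ValueError("X has no variance among the complete cases")
    for _ in range(max_iter):
        # E-step: expected sufficient statistics under the current parameters.
        slope = sxy / sxx
        resid_var = syy - sxy * slope          # Var(Y | X)
        s_x = s_y = s_xx = s_yy = s_xy = 0.0
        for xi, yi in zip(x, y):
            if yi is None:
                ey = my + slope * (xi - mx)    # E[Y | X = xi]
                eyy = ey * ey + resid_var      # E[Y^2 | X = xi]
            else:
                ey, eyy = yi, yi * yi
            s_x += xi
            s_y += ey
            s_xx += xi * xi
            s_yy += eyy
            s_xy += xi * ey                    # E[XY | X = xi] = xi * E[Y | X = xi]
        # M-step: maximum-likelihood update from the completed statistics.
        nmx, nmy = s_x / n, s_y / n
        nsxx = s_xx / n - nmx ** 2
        nsxy = s_xy / n - nmx * nmy
        nsyy = s_yy / n - nmy ** 2
        change = max(
            abs(old - new)
            for old, new in zip((mx, my, sxx, sxy, syy), (nmx, nmy, nsxx, nsxy, nsyy))
        )
        mx, my, sxx, sxy, syy = nmx, nmy, nsxx, nsxy, nsyy
        if change < tol:
            break
    return mx, my, sxx, sxy, syy


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 3.0, 2.0, 5.0, None, None]
print(em_bivariate_normal(x, y))
```

On these made-up data the complete-case mean of Y is 2.75, whereas EM, which uses the X values of the incomplete cases through the X-Y relationship, estimates it at 3.85.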


Subject(s)
Biostatistics , Dataset , Health Surveys , Models, Statistical , Relative Value Scales